In [16]:
from CSVtoSQLconverter import load_sql_engine
sqlEngine = load_sql_engine()
import pandas as pd
import numpy as np
# Provides better color palettes
import seaborn as sns
from pandas import DataFrame,Series
import matplotlib as mpl
import matplotlib.pyplot as plt
# Command to display the plots in the iPython Notebook
%matplotlib inline
import matplotlib.patches as mpatches
mpl.style.use('seaborn-whitegrid')
plt.style.use('seaborn-talk')
We have loaded in the SQL database the years 2010 to 2014. We can directly extract from the database the counts for each month of each year and this throughout this whole section. We use the Characteristics table since it each accident corresponds to one entry in this table. Furthermore this table contains a column for the data and time of the accident stored in the convenient datetime format.
In [37]:
PerMonth = pd.read_sql_query('''SELECT YEAR(datetime), MONTH(datetime), DAY(LAST_DAY(datetime)),
COUNT(`accident id`) FROM characteristics
GROUP BY YEAR(datetime), MONTH(datetime);''',
sqlEngine)
PerMonth.head()
Out[37]:
In [38]:
PerMonth.rename(columns={'YEAR(datetime)':'year','MONTH(datetime)':'month',
'DAY(LAST_DAY(datetime))':'number days',
'COUNT(`accident id`)':'accident count'},inplace=True)
PerMonth.head()
Out[38]:
In [52]:
PerYear = PerMonth.groupby(['year'],as_index=False).sum()
PerYear
Out[52]:
In [41]:
g = sns.factorplot(x="year",y='accident count',
data=PerYear, kind='bar', size=5, aspect=2.0)
It is pleasing to see that the number of accidents has been overall decreasing since 2010, although 2014 is slightly higher than 2013.
In [43]:
PerMonthNorm = PerMonth.copy()
PerMonthNorm['per day'] = PerMonthNorm['accident count'] / PerMonthNorm['number days']
PerMonthNorm.head()
Out[43]:
In [46]:
sns.factorplot(x='month', y='per day', hue="year", data=PerMonthNorm, size=5,aspect=2.0)
Out[46]:
The 5 years we have follow the same trends overall. We can safely average over them.
In [53]:
PerMonthMean = PerMonth.groupby('month', as_index=False).mean()
PerMonthMean['per day'] = PerMonthMean['accident count'] / PerMonthMean['number days']
PerMonthMean
Out[53]:
In [54]:
sns.factorplot(x='month', y='per day', data=PerMonthMean, size=5,aspect=2.0)
Out[54]:
The dip in July and August can be attibuted to the holiday season. August is the most important month for holidays in France. So despite the long distances travelled by some to reach their vacation spot, the number of accidents decreases during the holidays.
In [ ]: